<article scheme=default title="Texture mapping, part 3" author="Bonz">

<p align=c fl=0>
<font style=headline_def>
Texture mapping, part 3
</font>
<p>
<i>
<link external="mailto:bonzini@gnu.org">
Bonz
</link>
</i>

<p align=j spacing=14>
Hi, this is the third and final installment of the three-part series on
texture mapping.  The first part (by Boreal) covered affine texture
mapping (which is basically a 2D technique applied to 3D, and as such
often distorts textures in spectacular ways), and the second part (by me)
covered perspective-correct texture mapping.  This article covers
approximate techniques which give better results than affine
texturing and better speed than perspective-correct mapping; I'll also
briefly cover mipmapping.

<p spacing 16>
The five techniques that I'll analyze are <i>constant-Z</i> mapping,
<i>linear approximation</i>, <i>scanline subdivision</i>, <i>parabolic
approximation</i>, and <i>Bresenham</i> (aka <i>hyperbolic</i>) texture mapping.

<p>
Before starting, let me remind you of the conventions I'm using: <i>X</i>
and <i>Y</i> are screen coordinates, <i>u</i> and <i>v</i> are texture
coordinates, <i>xyz</i> are 3-D coordinates; finally <i>m</i> and
<i>n</i> are two "magic coordinates", linear in screen space, such that
<i>u=m/(1/z)</i> and <i>v=n/(1/z)</i>.


<p align=j spacing=15>
<font style subheadline_def>
Constant-Z mapping
</font>

<p spacing 16>
Constant-Z is based on a very interesting remark.  Since, as we saw,
<i>1/z</i> values are linear in screen-space with the following formula

<p>
<font style code>
<pre>         1/z = -a/d X - b/d Y - c/d
</pre>
</font>

<p>
lines with a slope of <i>-a/b</i> have a constant z: substitute

<p>
<font style code>
<pre>         Y = -a/b X + Y0
</pre>
</font>

<p>
and you have

<p>
<font style code>
<pre>         1/z = -a/d X + a/d X - b/d Y0 - c/d = -b/d Y0 - c/d
</pre>
</font>
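<p>
As a quick numerical check of this substitution, the following Python sketch
walks along one such line and evaluates <i>1/z</i> at every step.  The plane
coefficients and the helper name <i>inv_z</i> are made-up values for
illustration; the only thing taken from the text is the formula for <i>1/z</i>.

```python
def inv_z(X, Y, a, b, c, d):
    """1/z at screen point (X, Y), using the linear formula from the text."""
    return -a / d * X - b / d * Y - c / d


# Arbitrary example coefficients (assumed values, not from the article).
a, b, c, d = 2.0, 5.0, 1.0, -40.0
Y0 = 3.0

# Walk along Y = -a/b * X + Y0 and collect 1/z at each step.
samples = [inv_z(X, -a / b * X + Y0, a, b, c, d) for X in range(10)]

# The X terms cancel, leaving a value that depends only on Y0.
expected = -b / d * Y0 - c / d
```

<p>
Every entry of <i>samples</i> agrees with <i>expected</i> up to rounding, which
is the property the constant-Z scan relies on.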

<p>
Hence, along these lines, <i>u</i> and <i>v</i> vary linearly in
screen-space with these delta values:

<p>
<font style code>
<pre>         du = dm / (1/z)
         dv = dn / (1/z)
</pre>
</font>

<p>
The basic algorithm then looks like this:

<p>
<font style code>
<pre>         <i>for each constant-Z line in the polygon</i>
             <i>compute u, v, dm, dn as in part 2</i>
             du = dm / (1/z)
             dv = dn / (1/z)
             <i>for each point on</i> Y = -a/b (X - X0) + Y0
                <i>... plot the (u,v) texel at (X,Y)...</i>
                u = u + du
                v = v + dv
</pre>
</font>

<p>
There are many problems though with this simple approach.  The most
evident is that you must take care to avoid holes between the lines.
There is an easy way to do this: use a Bresenham line scanner
(like the one that elmm/[KCN] explained in Hugi 24) and set the
initial error as if you always scanned the lines from X=0 (if the line
is scanned horizontally, that is if <i>|a/b|&lt;1</i>) or from Y=0 (if the
line is scanned vertically, that is if <i>|a/b|&gt;1</i>).  The result is
that each line is exactly the same as the previous one, only translated
vertically.

<p>
So the inner loop becomes

<p>
<font style code>
<pre>        if |a/b|&lt;1
            error = X0 * |a| % |b|
            for each X on the line
                <i>... plot the (u,v) texel at (X,Y)...</i>
                error = error + |a|
                u = u + du
                v = v + dv
                if error >= |b| then
                    error = error - |b|
                    Y = Y + sgn(a/b)        <i>// - if Y's grow upwards</i>
        else
            error = Y0 * |b| % |a|
            for each Y on the line
                <i>... plot the (u,v) texel at (X,Y)...</i>
                error = error + |b|
                u = u + du
                v = v + dv
                if error >= |a| then
                    error = error - |a|
                    X = X + sgn(a/b)        <i>// - if Y's grow upwards</i>
</pre>
</font>
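<p>
To see why seeding the error term this way makes every line a translated copy
of the others, here is a small Python sketch of the horizontal case only.  The
function name <i>line_steps</i> and the integer slope values are assumptions
made for the example; it records the columns at which Y would change.

```python
def line_steps(x0, x1, abs_a, abs_b):
    """Return the X columns at which Y changes, for a slope with |a/b| < 1.

    The error term starts as if the line had been scanned from X = 0,
    so every line with the same slope steps at the same columns.
    """
    error = (x0 * abs_a) % abs_b
    steps = []
    for x in range(x0, x1 + 1):
        error += abs_a
        if error >= abs_b:
            error -= abs_b
            steps.append(x)
    return steps


from_zero = line_steps(0, 20, 2, 5)    # line starting at the screen edge
from_seven = line_steps(7, 20, 2, 5)   # line starting further right
```

<p>
Both lines step at the same columns wherever they overlap, so two neighbouring
lines can never leave a hole between them.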

<p>
Two other problems are less evident.  Just look at these examples
of how (intuitively) lines should be scanned

<p>
<font style code>
<pre>            1111                 45
           12222333             34567
          223333444455         23456789
         333444455            123456789ab
        444455               123456789abcde
       455                  0123456
</pre>
</font>

<p>
The problem with the first example is that there are holes in
the line labeled <i>5</i>.  This is most easily solved in the Bresenham
inner loop, by checking whether X is in range and not plotting
the pixel if it is not.

<p>
The problem with both examples is that I did not specify how to iterate
through all constant-z lines.  In the first example a reasonable choice
seems to be to pick the leftmost pixel of each line and scan to the right;
in the second example it seems to be to pick the bottom pixel
and scan upwards.  There are of course two symmetrical cases in which
you should pick the rightmost point and scan to the left, or pick the
topmost point and scan downwards.  This complicates the implementation
of this algorithm a lot.

<p>
And the result is not that good anyway, because the lines are not
exactly constant-z.  Since each of the two screen coordinates might be
off by up to half a pixel with respect to the ideal constant-z line, the
value of <i>1/z</i> might be off by up to <i>a/2d</i> or <i>b/2d</i>.
So, despite its promises, the algorithm is not correct for lines
that are not horizontal, vertical or at 45 degrees.

<p>
So, my position is: stay away from constant-Z.


<p align=j spacing=15>
<font style subheadline_def>
Approximating perspective-correct texture mapping
</font>

<p spacing 16>
Since mathematical tricks did not result in a good algorithm, let's
assume that the only way to do really correct texture mapping is to
loop on horizontal scanlines and do two divides per pixel.  To
save them, we should simply draw an approximation, though not one
as brutal as affine texture mapping.

<p>
The four approximations that I will present are: a segment, many
segments, a parabola, and a Bresenham-scanned hyperbola.

<p>
The graph below shows the linear approximation, the parabolic
approximation, and the original hyperbola.

<p align c>
<link desc "Two approximations of an hyperbola">
<image file images/bonz-tmap3.bmp>
</link>


<p align=j spacing=15>
<font style subheadline_def>
Two kinds of linear approximation
</font>

<p align j spacing 16>
This is quite a brutal way of approximating the hyperbola, as can be seen
from the graph.

<p>
This simply involves calculating the texture coordinates at both ends of each
scan line and linearly interpolating them in between.  This works well when
there is a small angle between the view direction and the normal of the object
being rendered, because in these cases the scanlines closely approximate
constant-Z lines.  However, as the angle increases the object starts to look
distorted.

<p>
Just compute the <i>u</i> and <i>v</i> values at the two endpoints and
interpolate in between:

<p>
<font style code>
<pre>    for each polygon
        compute A, B, C, D, E, F as in part 2
        scan convert it
        for each scanline Y
            X1 = leftmost point to be plotted (inclusive)
            X2 = rightmost point to be plotted (inclusive)

            <i>// compute 1/z and 3D x, y at the endpoints</i>
            zInv1 = -a/d * X1 - b/d * Y - c/d
            zInv2 = -a/d * X2 - b/d * Y - c/d
            x1 = X1 / zInv1
            x2 = X2 / zInv2
            y1 = Y / zInv1
            y2 = Y / zInv2

            <i>// compute u, v at the endpoints and the interpolation step</i>
            u1 = A * x1 + B * y1 + C
            v1 = D * x1 + E * y1 + F
            u2 = A * x2 + B * y2 + C
            v2 = D * x2 + E * y2 + F
            du = (u2 - u1) / (X2 - X1)
            dv = (v2 - v1) / (X2 - X1)
            u = u1
            v = v1

            for X = X1 to X2
                <i>... plot the (u,v) texel at (X,Y)...</i>
                u = u + du
                v = v + dv
</pre>
</font>
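<p>
To get a feeling for how far a single segment can stray from the hyperbola,
this Python sketch compares the two on one made-up scanline.  The names
<i>exact_u</i> and <i>linear_u</i>, the parameter <i>dzinv</i> (standing for
the per-pixel change of <i>1/z</i>) and all the numbers are assumptions chosen
for the example.

```python
def exact_u(X, X1, m1, dm, zinv1, dzinv):
    """Perspective-correct u at column X: a ratio of two linear functions."""
    return (m1 + dm * (X - X1)) / (zinv1 + dzinv * (X - X1))


def linear_u(X, X1, X2, u1, u2):
    """u interpolated linearly between the endpoint values u1 and u2."""
    return u1 + (u2 - u1) * (X - X1) / (X2 - X1)


# Made-up scanline: 1/z shrinks from 1.0 to 0.36 over 64 pixels.
X1, X2 = 0, 64
m1, dm, zinv1, dzinv = 0.0, 0.5, 1.0, -0.01

u1 = exact_u(X1, X1, m1, dm, zinv1, dzinv)
u2 = exact_u(X2, X1, m1, dm, zinv1, dzinv)

# Largest distance between the segment and the hyperbola, in texels.
worst = max(
    abs(linear_u(X, X1, X2, u1, u2) - exact_u(X, X1, m1, dm, zinv1, dzinv))
    for X in range(X1, X2 + 1)
)
```

<p>
With these numbers the two curves agree at the endpoints but drift apart by
roughly 22 texels in between; a scanline with less depth variation would fare
much better.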

<p>
The result as you can imagine is quite poor.  But it can be useful
for small triangles on which the viewer is likely not to notice the
distortions that the algorithm introduces.  An even better way is to
use piece-wise interpolations, each of which is at most 16 pixels long.

<p>
This code, lifted from part 2, is the basis for the following
algorithm:

<p>
<font style code>
<pre>    for each polygon
        compute A, B, C, D, E, F and other common values
        scan convert it
        for each scanline Y
            X1 = leftmost point to be plotted (inclusive)
            X2 = rightmost point to be plotted (inclusive)
            zInv = -a/d * X1 - b/d * Y - c/d
            m1 = A * X1 + B * Y + C * zInv
            n1 = D * X1 + E * Y + F * zInv
            dm = A - C * a/d
            dn = D - F * a/d
            <i>... plot a scanline ...</i>
</pre>
</font>

<p>
The "plot a scanline" part, in the case of piecewise linear interpolations,
goes like this:

<p>
<font style code>
<pre>    dm = dm * 16
    dn = dn * 16
    m = m1
    n = n1
    u = m / zInv
    v = n / zInv
    for X0 = X1 to X2 step 16
        <i>// Move the magic coordinates by big steps</i>
        m = m + dm
        n = n + dn
        zInv = zInv - 16 * a/d
        du = (m / zInv - u) / 16
        dv = (n / zInv - v) / 16

        <i>// Interpolate the texture coordinates linearly</i>
        for X = X0 to min(X0+15,X2)
            <i>... plot the (u,v) texel at (X,Y)...</i>
            u = u + du
            v = v + dv
</pre>
</font>
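<p>
The subdivision loop can be sketched in Python as follows, for <i>u</i> only.
This is an illustration under assumptions: the function name
<i>subdivided_scanline</i>, the parameter <i>dzinv</i> (the per-pixel change
of <i>1/z</i>) and the sample numbers are invented, and the routine returns the
interpolated values instead of plotting them.

```python
def subdivided_scanline(X1, X2, m1, dm, zinv1, dzinv, span=16):
    """Approximate u for X1..X2 with one divide per `span` pixels."""
    us = []
    m, zinv = m1, zinv1
    u = m / zinv
    for X0 in range(X1, X2 + 1, span):
        # Exact u at the far end of this span (it may lie beyond X2).
        m += dm * span
        zinv += dzinv * span
        du = (m / zinv - u) / span
        # Interpolate linearly inside the span.
        for _ in range(X0, min(X0 + span, X2 + 1)):
            us.append(u)
            u += du
    return us


# Made-up scanline of 64 pixels with a strong depth gradient.
exact = [0.5 * X / (1.0 - 0.01 * X) for X in range(64)]
fine = subdivided_scanline(0, 63, 0.0, 0.5, 1.0, -0.01, span=16)
coarse = subdivided_scanline(0, 63, 0.0, 0.5, 1.0, -0.01, span=64)

worst_fine = max(abs(p - q) for p, q in zip(fine, exact))
worst_coarse = max(abs(p - q) for p, q in zip(coarse, exact))
```

<p>
On this example the 16-pixel spans keep the error several times smaller than
a single 64-pixel segment does, at the price of four divides instead of one.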

<p>
The results are quite good, so this is often the algorithm of choice
to implement texture mapping.  The algorithm can also be made adaptive
(that is, you can render surfaces which need more accuracy
with 8-pixel-wide interpolations, while others which can be less
precise can be done with 16-pixel spans).


<p align=j spacing=15>
<font style subheadline_def>
Parabolic approximation
</font>

<p spacing 16>
Instead of a line, using a parabola can give more appealing results.
The algorithm can be deduced from observing that the second differences
(<b>not</b> <i>derivatives</i>!) of parabolas are constant:

<p>
<font style code>
<pre>    <font face "Symbol">D</font>y = A (x+1)^2 + B (x+1) + C - Ax^2 - Bx - C = 2x A + A + B
    <font face "Symbol">DD</font>y = 2 (x+1) A + A + B - 2x A - A - B = 2A
</pre>
</font>

<p>
To plot a parabola you can use the following inner loop:

<p>
<font style code>
<pre>    dy = 2*x1*A + A + B
    ddy = 2*A
    y = A*x1^2 + B*x1 + C
    for x = x1 to x2
        <i>plot (x,y)</i>
        y = y + dy
        dy = dy + ddy
</pre>
</font>
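<p>
The same loop written in Python makes it easy to check that the two additions
really reproduce the parabola.  The function name <i>parabola_forward</i> and
the coefficients used to try it out are arbitrary choices for this sketch.

```python
def parabola_forward(A, B, C, x1, x2):
    """Values of A*x^2 + B*x + C for x1..x2, using additions only in the loop."""
    dy = 2 * x1 * A + A + B      # first difference at x1
    ddy = 2 * A                  # constant second difference
    y = A * x1 ** 2 + B * x1 + C
    ys = []
    for _ in range(x1, x2 + 1):
        ys.append(y)
        y += dy
        dy += ddy
    return ys


stepped = parabola_forward(3, -2, 5, -4, 10)
direct = [3 * x * x - 2 * x + 5 for x in range(-4, 11)]
```

<p>
With integer coefficients the two lists are identical; with floating point or
fixed point values small rounding errors accumulate along the scanline.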

<p>
which requires two additions per pixel, just twice the cost of the linear
approximation.  The math to do the interpolation is boring but easy.  We
need to solve a linear system in three unknowns, like the one that we solved
(in part 2) to obtain texture coordinates for arbitrary points of the
polygon.  If a scanline is <i>2k</i> pixels long, the system to be solved is

<p>
<font style code>
<pre>    a 0^2 + b 0 + c = u1
    a k^2 + b k + c = u2
    a (2k)^2 + b (2k) + c = u3
</pre>
</font>

<p>
and its solution is

<p>
<font style code>
<pre>    a = (u3 - 2*u2 + u1) / (2*k^2)
    b = (4*u2 - u3 - 3*u1) / (2*k)
    c = u1
</pre>
</font>

<p>
Now, plugging this into the formulas for the first and second differences
above, we have at the start of the loop

<p>
<font style code>
<pre>    ddu = 2a = (u3 - 2*u2 + u1) / k^2
    du = a + b = ddu/2 + (4*u2 - u3 - 3*u1) / (2*k)
    u = u1
</pre>
</font>
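<p>
As a sanity check on these formulas, the Python sketch below sets up the
differences from three samples and then steps through the scanline.  The
function names <i>parabola_setup</i> and <i>step_parabola</i> and the three
sample values are assumptions made up for the example.

```python
def parabola_setup(u1, u2, u3, k):
    """Start value and differences for the parabola through
    (0, u1), (k, u2) and (2k, u3)."""
    ddu = (u3 - 2 * u2 + u1) / (k * k)
    du = ddu / 2 + (4 * u2 - u3 - 3 * u1) / (2 * k)
    return u1, du, ddu


def step_parabola(u, du, ddu, count):
    """Return `count` successive values produced by forward differencing."""
    values = []
    for _ in range(count):
        values.append(u)
        u += du
        du += ddu
    return values


# Made-up samples of u at the start, middle and end of a 2k-pixel scanline.
k = 32
u, du, ddu = parabola_setup(0.0, 23.5, 88.9, k)
values = step_parabola(u, du, ddu, 2 * k + 1)
```

<p>
The stepped curve passes through all three samples (up to rounding), so the
approximation is exact at the start, the middle and the end of the scanline.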

<p>
and similarly for dv.  So we have

<p>
<font style code>
<pre>    for each polygon
        compute A, B, C, D, E, F as in part 2
        scan convert it
        for each scanline Y
            X1 = leftmost point to be plotted (inclusive)
            X3 = rightmost point to be plotted (inclusive)
            k  = (X3 - X1) / 2
            X2 = X1 + k

            <i>// compute 1/z and 3D x, y at the three interpolation points</i>
            zInv1 = -a/d * X1 - b/d * Y - c/d
            zInv2 = zInv1 - a/d * k
            zInv3 = zInv2 - a/d * k
            x1 = X1 / zInv1
            x2 = X2 / zInv2
            x3 = X3 / zInv3
            y1 = Y / zInv1
            y2 = Y / zInv2
            y3 = Y / zInv3

            <i>// compute u, v at the interpolation points</i>
            u1 = A * x1 + B * y1 + C
            v1 = D * x1 + E * y1 + F
            u2 = A * x2 + B * y2 + C
            v2 = D * x2 + E * y2 + F
            u3 = A * x3 + B * y3 + C
            v3 = D * x3 + E * y3 + F

            <i>// compute the first and second differences</i>
            ddu = (u3 - 2*u2 + u1) / (k*k)
            du = ddu/2 + (4*u2 - u3 - 3*u1) / (2*k)
            u = u1

            ddv = (v3 - 2*v2 + v1) / (k*k)
            dv = ddv/2 + (4*v2 - v3 - 3*v1) / (2*k)
            v = v1

            for X = X1 to X3
                <i>... plot the (u,v) texel at (X,Y)...</i>
                u = u + du
                v = v + dv
                du = du + ddu
                dv = dv + ddv
</pre>
</font>

<p align=j spacing=15>
<font style subheadline_def>
Bresenham scanning for texture mapping
</font>

<p spacing 16>
This is the application of Bresenham's algorithms for conic sections.
As you should know from elmm's Hugi 24 article, Bresenham's approach
is to add or subtract 1 to one or both coordinates at each step, tracking
the error along the way so that you know which move is best at each step.
Bresenham-style algorithms exist not only for lines, circles and ellipses,
but also for arbitrary conics.  The general version however is
too complicated for our purposes, so we will derive ours from scratch
and obtain an incomplete conic drawer which perfectly fits our needs.

<p>
The formula for computing texture coordinates,

<p>
<font style code>
<pre>         u = (dm * (X-X1) + m1) / (-a/d * (X-X1) + zInv1)
</pre>
</font>

<p>
can be rewritten like this in implicit form

<p>
<font style code>
<pre>         f(X, u) = 
             = -a/d * (X-X1) * u + zInv1 * u - dm * (X-X1) - m1 =
             = -a/d * X * u + (a/d * X1 + zInv1) * u - dm * X - m1 + dm*X1 = 0
</pre>
</font>

<p>
and what we'll have to do is to move <i>X</i> and <i>u</i> so as to keep
<i>f(X, u)</i> as close to zero as possible.

<p>
Let's see what happens when we change <i>X</i> and <i>u</i>: again, the second
differences are constant (this time, they are mixed second differences).

<p>
<font style code>
<pre>    (<font face "Symbol">D</font>f)X = -a/d * u - dm
    (<font face "Symbol">D</font>f)u = -a/d * X + (a/d * X1) + zInv1
    (<font face "Symbol">D</font>f)Xu = -a/d
    (<font face "Symbol">D</font>f)uX = -a/d
</pre>
</font>

<p>
<i>(<font face "Symbol">D</font>f)X</i> is how <i>f</i>
varies when <i>X</i> is incremented; 
<i>(<font face "Symbol">D</font>f)Xu</i> is how
<i>(<font face "Symbol">D</font>f)X</i> varies when <i>u</i>
is incremented; and so on.

<p>
The Bresenham algorithm for drawing the full hyperbola should draw each octant
separately, but since we are only interested in an arc of the hyperbola, we can
make it clearer and way more efficient by carefully observing the signs of the
differences.  Writing the function as

<p>
<font style code>
<pre>                                           A+BX
    f(X, u) = A+BX+Cu+DXu = 0        u = - ----
                                           C+DX
                                 
    (<font face "Symbol">D</font>f)X = Du+B
    (<font face "Symbol">D</font>f)u = DX+C
                 
</pre>
</font>

<p>
we notice that <i>(<font face "Symbol">D</font>f)u</i> is the denominator of
<i>u</i>, so its sign is constant: otherwise the denominator would cross zero
and <i>u</i> would go wild.  It must also be positive, since both <i>u</i> and
the numerator <i>-(A+BX)</i> are positive (the numerator cannot change sign
either and, in the full formula for <i>u</i>, it is equal to <i>m1</i> at
<i>X=X1</i>, which is positive).

<p>
With <i>(<font face "Symbol">D</font>f)u</i>&gt;0,
increasing values of <i>u</i> tend to increase <i>f</i> and decreasing values
of <i>u</i> tend to decrease <i>f</i>.  Hence, the sign of
<i>(<font face "Symbol">D</font>f)X</i> must compensate for this
variation: if on a given scanline <i>u2-u1&gt;0</i>,
<i>(<font face "Symbol">D</font>f)X</i> will be negative, and
of course positive if <i>u2-u1&lt;0</i>.

<p>
All this, to say that the sequence of <i>u</i> will be monotonic and that
we can extract many <i>if</i>s from the inner loop, which then becomes:

<p>
<font style code>
<pre>    X1 = leftmost point to be plotted (inclusive)
    X2 = rightmost point to be plotted (inclusive)

    <i>// u1, u2 are the texture coordinates at X1 and X2</i>
    ddf = -a/d
    dfX = -a/d * u1 - dm
    dfu = zInv1
    f = dfu * u1 - m1
    X = X1
    u = u1
    if u2&gt;u1 then du=1 else du=-1, dfu=-dfu, ddf=-ddf
    
    <i>// u moves monotonically in the direction given by du</i>
    loop:
        fX = f + dfX
        fu = f + dfu
        if abs(fX) &lt; abs(fu) then
            if X = X2 then exit loop
            X = X + 1
            f = fX
            dfu = dfu + ddf
        else
            u = u + du
            f = fu
            dfX = dfX + ddf
</pre>
</font>
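<p>
Here is how the single-hyperbola scan might look in Python.  Treat it as a
sketch under assumptions rather than a reference implementation: the function
name <i>hyperbola_scan</i> and the parameter <i>dzinv</i> (the per-pixel change
of <i>1/z</i>, that is <i>-a/d</i>) are invented, the sign handling for
decreasing <i>u</i> is folded into signed differences, and the routine records
the current <i>u</i> each time it decides to advance <i>X</i>.

```python
def hyperbola_scan(X1, X2, m1, dm, zinv1, dzinv, u1, u2):
    """Return the integer u chosen for each column X1..X2."""
    du = 1 if u2 > u1 else -1
    dfX = dzinv * u1 - dm    # change in f when X grows by 1
    dfu = zinv1 * du         # change in f when u moves by du
    dd = dzinv * du          # mixed second difference, signed by du
    f = zinv1 * u1 - m1      # initial error f(X1, u1)
    X, u = X1, u1
    result = []
    while True:
        fX = f + dfX
        fu = f + dfu
        if abs(fX) < abs(fu):
            # Moving along X leaves the smaller error: u is settled for X.
            result.append(u)
            if X == X2:
                break
            X += 1
            f = fX
            dfu += dd
        else:
            u += du
            f = fu
            dfX += dd
    return result


# Made-up scanline with integer parameters, so the arithmetic is exact:
# the exact coordinate would be u = 50*X / (100 - X).
us = hyperbola_scan(0, 40, m1=0, dm=50, zinv1=100, dzinv=-1, u1=0, u2=33)
```

<p>
On this example the recorded values never decrease and stay close to the
exact hyperbola, using only additions and comparisons inside the loop.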

<p>
This gives a series of points: at every step one coordinate varies by 1
and the other does not change.  To make this a rendering algorithm
we must combine two Bresenham scans (one for <i>u</i> and one for <i>v</i>),
as in the following code.

<p>
<font style code>
<pre>    X1 = leftmost point to be plotted (inclusive)
    X2 = rightmost point to be plotted (inclusive)

    <i>// compute the differences for the two hyperbolas</i>
    dd = -a/d
    dfX = dd * u1 - dm
    dgX = dd * v1 - dn
    dfu = zInv1
    dgv = zInv1
    
    <i>// compute the initial errors</i>
    f = dfu * u1 - m1
    g = dgv * v1 - n1
    X = X1
    u = u1
    v = v1
    <i>// stepping a coordinate backwards flips the sign of its differences</i>
    if u2&gt;u1 then du = 1, ddf = dd else du = -1, dfu = -dfu, ddf = -dd
    if v2&gt;v1 then dv = 1, ddg = dd else dv = -1, dgv = -dgv, ddg = -dd

    loop:
        fX = f + dfX
        fu = f + dfu
        if abs(fX) &lt; abs(fu) then
            <i>// Advance g(X,v) to the best v for the next X</i>
            gX = g + dgX
            gv = g + dgv
            while abs(gX) &gt; abs(gv) do
                v = v + dv
                g = gv
                dgX = dgX + ddg
                gX = g + dgX
                gv = g + dgv

            <i>... plot the (u,v) texel at (X,Y)...</i>

            if X = X2 then exit loop
            X = X + 1
            f = fX
            g = gX
            dfu = dfu + ddf
            dgv = dgv + ddg
        else
            u = u + du
            f = fu
            dfX = dfX + ddf
</pre>
</font>

<p>
There is a problem with this algorithm: the complexity of the other approximations
is the length of the scanline, while the complexity of the Bresenham algorithm
is the length of the scanline <i>plus</i> the difference between the values of
the texture coordinates at the endpoints.  So, it is particularly wasteful when
the triangles are far from the viewer, because the payload then gets really
small: the complexity becomes linked to the number of texels to
be skipped, rather than (as would be desirable) to the number of pixels plotted.

<p>
The answer can be to switch from linear to parabolic to Bresenham approximations
depending on the distance of the object from the viewer, but another very
attractive approach is...


<p align=j spacing=15>
<font style subheadline_def>
Mip-mapping
</font>

<p spacing 16>
When playing Wolfenstein or Doom, you will probably have noticed some flickering
artifacts in the farthest walls when you moved forwards/backwards.  This is due to
aliasing: the choice of which pixel to plot for the smallest textures is quite random
when you have to pick only ten or so pixels out of 256 -- one time it might be the
black texel at coordinates (25,32), the next frame it might be the red one at
coordinates (35,40), and so on.

<p>
The way to go is to compute different versions of the texture (the <i>mipmaps</i>);
each one is the same as the original texture, but scaled down by some factor.  Since you
then never skip texels in your texture, you can exploit Bresenham texture mapping to
its best, and this also prevents details on the texture from flashing, so the image
quality and stability go up.

<p>
The total memory needed for a full set of mipmaps is only 4/3 the size of the
original texture: this is because <i>1+1/4+1/16+1/64+... = 4/3</i>.  This is
probably less than you would have expected.  When doing mipmapping, the hardest
thing to do is to determine which mipmap to use; a simple yet very precise way
is to compute the ratio between the size of the texture and the size of the
triangle to be drawn on screen.  The formula for the area of a triangle is

<p>
<font style code>
<pre>    | 1      / x1 y1 1 \  |   | (x2-x1) (y3-y1) - (x3-x1) (y2-y1) |
    | - det |  x2 y2 1  | | = -------------------------------------
    | 2      \ x3 y3 1 /  |                     2
</pre>
</font>

<p>
where the bars stand for the absolute value; so the mipmap factor will be

<p>
<font style code>
<pre>    | (u2-u1) (v3-v1) - (u3-u1) (v2-v1) |
    | --------------------------------- |
    | (x2-x1) (y3-y1) - (x3-x1) (y2-y1) |
</pre>
</font>

<p>
When the factor is 4 we should already be using the second mipmap because we're
already skipping every other texel.  When it's 16 we should be using the third,
and so on.  You can also switch a little earlier, for example at 3, 12, 48, etc.
(just experiment and see what pleases you the most).
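<p>
Put together, the selection could be sketched in Python like this.  The
function names (<i>doubled_area</i>, <i>mip_level</i>,
<i>mip_chain_texels</i>) and the parameter <i>first_switch</i> are
hypothetical; the sketch assumes that each mipmap halves the width and the
height of the previous one, which is what the 4/3 figure implies.

```python
def doubled_area(p1, p2, p3):
    """Twice the unsigned area of the triangle p1, p2, p3 (2-D points)."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    return abs((x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1))


def mip_level(tex_tri, screen_tri, num_levels, first_switch=4.0):
    """Pick a mipmap index (0 = full size) from the texel/pixel area ratio."""
    screen = doubled_area(*screen_tri)
    if screen == 0:
        return num_levels - 1      # degenerate triangle: smallest mipmap
    factor = doubled_area(*tex_tri) / screen
    level, threshold = 0, first_switch
    while factor >= threshold and level < num_levels - 1:
        level += 1
        threshold *= 4             # each level has a quarter of the texels
    return level


def mip_chain_texels(size):
    """Total texels in a square texture of side `size` plus all its mipmaps."""
    total = 0
    while size >= 1:
        total += size * size
        size //= 2
    return total


tex = [(0, 0), (64, 0), (0, 64)]
level = mip_level(tex, [(0, 0), (32, 0), (0, 32)], num_levels=5)
```

<p>
Here the triangle covers a quarter as many pixels as texels, so the second
mipmap (index 1) is chosen; passing <i>first_switch=3.0</i> moves the
switching points to 3, 12, 48 and so on.  For a 256x256 texture,
<i>mip_chain_texels(256)</i> gives 87381 texels, about 4/3 of the 65536 in the
original.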


<p align=j spacing=15>
<font style subheadline_def>
Conclusion
</font>

<p spacing 16>
I hope you now understand texture mapping and the different ways to implement it.  I
like Bresenham texture mapping a lot when coupled with mipmapping, yet it is
not easy to find a complete tutorial on it -- so this is not entirely stuff
seen too many times already.

<p>
This stuff did not become completely obsolete with the advent of acceleration;
having a good texture mapper can get that +5% speed that you needed out of
your raytracer, for example.

<p>
Have fun, and let me know if you use the info contained in this tutorial!

<p align l>
<font style code>
<pre>--
|_  _  _ __
|_)(_)| ),'
-------  `-.
</pre>
</font>
